Reconstructing Websites for the Lazy Webmaster

Authors

  • Frank McCown
  • Joan A. Smith
  • Michael L. Nelson
  • Johan Bollen
Abstract

Backup or preservation of websites is often not considered until after a catastrophic event has occurred. In the face of complete website loss, “lazy” webmasters or concerned third parties may be able to recover some of their website from the Internet Archive. Other pages may also be salvaged from commercial search engine caches. We introduce the concept of “lazy preservation”: digital preservation performed as a result of the normal operations of the Web infrastructure (search engines and caches). We present Warrick, a tool to automate the process of website reconstruction from the Internet Archive, Google, MSN and Yahoo. Using Warrick, we have reconstructed 24 websites of varying sizes and composition to demonstrate the feasibility and limitations of website reconstruction from the public Web infrastructure. To measure Warrick’s window of opportunity, we have profiled the time required for new Web resources to enter and leave search engine caches.


Similar resources

Website Reconstruction using the Web Infrastructure

Backup or preservation of websites is often not considered until after a catastrophic event has occurred. In the face of complete website loss, webmasters or concerned third parties may be able to recover some of their website from the Internet Archive. Other pages may also be salvaged from commercial search engine (SE) caches if caught in time. We introduce the concept of “lazy preservation”di...


Website Forensic Investigation to Identify Evidence and Impact of Compromise

Compromised websites that redirect users to malicious websites are often used by attackers to distribute malware. These attackers compromise popular websites and integrate them into a drive-by download attack scheme to lure unsuspecting users to malicious websites. An incident response organization such as a CSIRT contributes to preventing the spread of malware infection by analyzing compromise...


An Introduction to Implicit Invocation Architectures

ColdFusion's initial appeal was to "webmasters" who wanted to make their sites more dynamic. It succeeded admirably. But just as the term, webmaster, is an anachronism, the call for more dynamic websites has been succeeded by the need for true web applications. As these applications become more involved and more ambitious in scope, ColdFusion developers find that a thorough knowledge of tags an...


Ana and the Internet: a review of pro-anorexia websites.

OBJECTIVE The purpose of this article is to describe the content of pro-anorexia websites, both qualitatively and quantitatively. METHOD An Internet search protocol was developed to identify pro-anorexia websites. A grounded theory approach was used to generate themes from Internet-based information. Basic descriptive analysis was employed to report on key website characteristics. RESULTS T...


Websites of Indian Institutes of Technology: a Webometric Study

The study explored different link-analysis characteristics of sixteen IIT websites. All the IITs have their own websites, and all of these websites operate under the same domain, “.ac.in”. The rankings of the Indian Institutes of Technology (IITs) have been compared using WISER, WIF (in-link) and World Rank. The WISER ranking and WIF (in-link) are correlated, i.e. +0....



Journal:
  • CoRR

Volume: abs/cs/0512069

Publication date: 2005